
Fix slugify reporter variations #5997


Conversation

quevon24
Member

@quevon24 quevon24 commented Jul 17, 2025

Instead of solving this in get_canonicals_from_reporter(), I fixed it directly in slugify_reporter().

Why? Because slugify_reporter() is the first and central place where reporter input is normalized. By adding a final fallback check for known variations (using VARIATIONS_ONLY), we can cleanly disambiguate most cases before they hit more complex logic later.
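A minimal sketch of that fallback, for illustration only. The two dictionaries are toy stand-ins for the `REPORTERS` and `VARIATIONS_ONLY` data, and `naive_slug()` and `slugify_reporter_sketch()` are hypothetical names; the actual `slugify_reporter()` in this PR may order or implement the checks differently.

```python
import re

# Toy stand-ins for the reporter data (assumption: the real structures are
# much larger and may be shaped differently).
REPORTERS = {"B.R.": "Bankruptcy Reporter", "Vroom": "Vroom's Law Reports"}
VARIATIONS_ONLY = {"Vr.": ["Vroom"], "B.R.": ["Balt. C. Rep."]}


def naive_slug(text: str) -> str:
    """Lowercase, drop punctuation, and hyphenate whitespace (assumed rule)."""
    text = re.sub(r"[^\w\s-]", "", text.lower())
    return re.sub(r"[\s_-]+", "-", text).strip("-")


def slugify_reporter_sketch(reporter: str) -> str:
    # A string that is itself a known reporter wins outright.
    if reporter in REPORTERS:
        return naive_slug(reporter)
    # Final fallback: look up the raw, still-punctuated string as a variation.
    # Only an unambiguous variation (exactly one canonical) is redirected.
    canonicals = VARIATIONS_ONLY.get(reporter, [])
    if len(canonicals) == 1:
        return naive_slug(canonicals[0])
    return naive_slug(reporter)
```

The lookup runs on the raw input before punctuation is stripped, which is what lets near-identical inputs such as "Mt." and "mt" map to different slugs.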

What works now
Here are cases that now redirect correctly from the variation to the intended slug:

| Input Variant | Redirected Slug |
| --- | --- |
| Vr. | vroom |
| V.R. | vt |
| Black Rep. | black |
| Black. Rep. | blackf |
| Cal. App. 2d Supp | cal-app-2d |
| Cal. App. 2d Supp. | cal-app-supp-2d |
| CLR | conn-l-rptr |
| Cl.R. | cl-ch |
| Dec. Commr. Pat. | dec-com-pat |
| Dec. Comm'r Pat. | dec-commr-pat |
| Hayw. & H. | hayw-hdc |
| Hayw.& H. | hay-haz |
| Johns.(N.Y.) | johns-ch |
| Johns.N.Y. | johns |
| Mt. | mont |
| mt | mt |
| Pa.C. | pa-commw |
| Pac. | p |
| Sc. | scam |

What still can't be disambiguated
Some abbreviations are too ambiguous to resolve safely. For example:

"B.R." is used both as a standalone reporter (Bankruptcy Reporter) and as a variation of Baltimore City Reports ("Balt. C. Rep."). When we receive "B.R.", it exists in REPORTERS, so the function assumes it's the Bankruptcy Reporter and returns its slug "br". But this could be incorrect: the user might have meant "Balt. C. Rep." (slug "balt-c-rep"), which also uses "B.R." as a variation.

Because we can't safely guess the intent, we continue suggesting an alternative reporter:

[Screenshot: reporter page suggesting an alternative reporter]

Or continue showing a page with HTTP 300 like this if disambiguation is not possible:

[Screenshot: HTTP 300 page shown when disambiguation is not possible]

Here's the list of cases that cannot be automatically disambiguated:

| Input Variant | Canonical(s) | Status | Why? |
| --- | --- | --- | --- |
| B.R. | br, balt-c-rep | 300 | "B.R." is both a reporter and a variation of another |
| BR | br, balt-c-rep | 300 | Same as above, unpunctuated version |
| Wash. | wash, wash-terr | 300 | Could refer to Washington Reports or Washington Territory Reports |
| WASH | wash, wash-terr | 300 | Uppercase version, same ambiguity |
| HOW | how, howard | 300 | Could be Howard's Supreme Court Reports Howard's Reports or Mississippi Reports, Howard. |
| How. | how, howard | 300 | Same as above |
| OKla. | okla, okla-crim | 300 | Could be Oklahoma Criminal Reports or Oklahoma Reports |
| Okla. | okla, okla-crim | 300 | Same as above |
| S.C. | s-ct, scam | 300 | Could be West's Supreme Court Reporter or South Carolina Reports |
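To make the 300 cases concrete, here is a rough sketch of how an input with more than one candidate could be told apart from one that can be redirected. Everything in it is an assumption for illustration: the toy data, the `SLUGS` mapping, the function names, and the status codes used for the single-match and no-match branches are not taken from this PR.

```python
from http import HTTPStatus

# Toy data (assumption): a string can be a reporter key and a variation at once.
REPORTERS = {"B.R.": "Bankruptcy Reporter", "Wash.": "Washington Reports"}
VARIATIONS_ONLY = {
    "B.R.": ["Balt. C. Rep."],
    "Wash.": ["Wash. Terr."],
    "Vr.": ["Vroom"],
}
# Hypothetical canonical-name-to-slug mapping.
SLUGS = {
    "B.R.": "br",
    "Balt. C. Rep.": "balt-c-rep",
    "Wash.": "wash",
    "Wash. Terr.": "wash-terr",
    "Vroom": "vroom",
}


def candidate_slugs(reporter: str) -> list[str]:
    """Collect every slug the raw input could plausibly refer to."""
    candidates = []
    if reporter in REPORTERS:
        candidates.append(SLUGS[reporter])
    for canonical in VARIATIONS_ONLY.get(reporter, []):
        slug = SLUGS[canonical]
        if slug not in candidates:
            candidates.append(slug)
    return candidates


def resolve(reporter: str) -> tuple[HTTPStatus, list[str]]:
    candidates = candidate_slugs(reporter)
    if not candidates:
        return HTTPStatus.NOT_FOUND, []
    if len(candidates) == 1:
        # Unambiguous: safe to redirect (the exact redirect status is assumed).
        return HTTPStatus.FOUND, candidates
    # Ambiguous: list the options instead of guessing the user's intent.
    return HTTPStatus.MULTIPLE_CHOICES, candidates
```

With this toy data, `resolve("B.R.")` yields a 300 with both `br` and `balt-c-rep`, while `resolve("Vr.")` yields a single redirect target.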

@quevon24 quevon24 marked this pull request as ready for review July 18, 2025 00:56
@quevon24 quevon24 assigned quevon24 and flooie and unassigned quevon24 Jul 18, 2025
@quevon24 quevon24 moved this to PRs to Review in Case Law Sprint Jul 18, 2025
Contributor

@flooie flooie left a comment

Makes sense to do it here,
but I wonder if we need to do some cleanup for some of those courts-db slugs.

@flooie flooie enabled auto-merge July 28, 2025 14:26
@flooie flooie merged commit 25730ba into main Jul 28, 2025
9 checks passed
@flooie flooie deleted the 5389-slugifying-variations-in-get_canonicals_from_reporter-discards-some-data branch July 28, 2025 14:32
@github-project-automation github-project-automation bot moved this from PRs to Review to Done in Case Law Sprint Jul 28, 2025
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Slugifying variations in get_canonicals_from_reporter discards some data
2 participants